ABSTRACT

The Kepler mission is providing photometric data of exquisite quality for the asteroseismic study of different classes of pulsating stars. These analyses place particular demands on the pre-processing of the data, over a range of timescales from minutes to months. Here, we describe processing procedures developed by the Kepler Asteroseismic Science Consortium (KASC) to prepare light curves that are optimized for the asteroseismic study of solar-like oscillating stars, in which outliers, jumps and drifts are corrected.
1 INTRODUCTION

The primary scientific objective of the NASA Kepler Mission is to search for Earth-like planets in the habitable zones of solar-like stars through the observation of photometric transits for at least 3.5 years (Borucki et al. 2010; Koch et al. 2010). Launched on March 7, 2009 (UTC), Kepler continuously monitors about 150,000 stars in a single field of view (FOV) of 115 deg² in the constellation of Cygnus, a field selected to provide the optimal density of stars.

Kepler is in a 372.5-day, Earth-trailing, heliocentric orbit. This requires the satellite to perform 90° rolls about its axis every 93 days to keep the solar panels illuminated and the radiator, which cools the focal-plane arrays, pointed away from the Sun (Haas et al. 2010). Data are consequently subdivided into quarters (denoted Qn or Qn.m, where n is the quarter number and m the month), starting with the initial 10-day-long commissioning run (Q0), followed by a 34-day-long first quarter (Q1) and subsequent three-month-long quarters (Q2, Q3, ...).

The high precision of the differential photometry carried out by Kepler makes it an ideal instrument for asteroseismic studies, which require long and continuous observations, as part of the Kepler Asteroseismic Investigation (KAI; Gilliland et al. 2010a; Kjeldsen et al. 2010). The study of the resonant modes propagating inside a star thus complements the main scientific objectives of the Kepler Mission by characterizing the stars that potentially host planets (e.g. Moya et al. 2010; Christensen-Dalsgaard et al. 2010; Gaulme et al. 2010) and their influence on the habitable zones, for example through their magnetic activity (e.g. Mosser et al. 2005, 2009; Karoff et al. 2009; Mathur et al. 2010; García et al. 2010). Indeed, the properties of the eigenmodes depend on the internal structure and dynamics of the star in such a way that fundamental stellar properties, such as masses, radii, and ages, can be inferred directly, to levels of precision that would be difficult to reach with more classical methods (e.g. Stello et al. 2009; Metcalfe et al. 2010; Kallinger et al. 2010; Creevey et al. in preparation). Moreover, the unprecedented number of oscillating stars, covering most of the HR diagram, observed by Kepler (e.g. Bedding et al. 2010; Chaplin et al. 2010; Grigahcene et al. 2010; Stello et al. 2010) and by the French-led Convection Rotation and planetary Transits (CoRoT) satellite (e.g. Hekker et al. 2009; García et al. 2009; Deheuvels et al. 2010) will soon modify our view of stellar evolution through the new constraints that we will be able to impose on the physical processes occurring in their interiors (e.g. Miglio et al. 2009, 2010; Chaplin et al. 2011).

Figure 1. Raw (black), PDC-corrected (blue) and corrected (red; using the procedure described in this paper) light curves of the solar-like target KIC 11395018 (Mathur et al. 2011). The corrected light curve has been shifted down by 4 × 10⁵ e⁻/cadence for clarity. The time axis is in Modified Julian Date (MJD) − 55000. The abrupt drops in flux are mostly due to momentum-dump operations. LOFP stands for Loss Of Fine Pointing.

For each star, two types of light curves are available to the Kepler Asteroseismic Science Consortium (KASC) for asteroseismic investigations through the Kepler Asteroseismic Science Operations Center (KASOC) database (http://kasoc.phys.au.dk/): on the one hand, raw time series affected by some instrumental perturbations; on the other hand, corrected light curves in which housekeeping data have been used to minimize those perturbations. These corrected data sets are produced during the Pre-search Data Conditioning (PDC), which enables the search for exoplanet transits (Jenkins et al. 2010). While the PDC data sets are constantly evolving as new and more refined procedures are established, we found that, in some cases, part of the low-frequency stellar signal (such as that produced by long-lived starspots (Croll et al. 2006), short-lived starspots (Mosser et al. 2009) or low-frequency modes) could be modified. Therefore, for solar-like oscillating stars as well as some classical pulsators (δ Scuti and γ Doradus stars), we decided to take the raw data sets and develop our own methods to correct for these perturbations.



2 KEPLER OBSERVATIONS AND STELLAR 
LIGHT CURVES 

Kepler observations are made in two different operating modes. Long-cadence (LC) targets, which include all targets for exoplanet research for which signatures of photometric transits are sought, are sampled every 29.4244 minutes (Nyquist frequency of 283.2 μHz). For the brightest stars (down to Kepler magnitude Kp ≈ 12), short-cadence (SC) observations can be obtained with a shorter sampling interval of 58.84876 s (Nyquist frequency of ~8.5 mHz), allowing for more precise transit timing. However, due to telemetry limitations, this mode is only available for 512 targets. In both cases, the integration time is set to 6.02 s with a readout time of 0.52 s. The time stamps give the mid-time of each cadence with an intended accuracy of ±0.050 s (Gilliland et al. 2010b); this timing accuracy has not yet been verified.
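As a quick consistency check on these numbers, the Nyquist frequency of a series sampled every Δt is 1/(2Δt). The short Python sketch below assumes that one long cadence spans 30 short cadences (30 × 58.84876 s = 29.4244 min, matching the LC interval quoted above); the constant and function names are illustrative only. It gives about 283.2 μHz for LC and about 8.5 mHz for SC data.

```python
# Sampling intervals quoted in the text (seconds).
SC_CADENCE_S = 58.84876
LC_CADENCE_S = 30 * SC_CADENCE_S  # 30 short cadences = 1765.46 s = 29.4244 min


def nyquist_uhz(cadence_s: float) -> float:
    """Nyquist frequency, 1 / (2 * dt), in microhertz for a sampling interval dt in seconds."""
    return 1e6 / (2.0 * cadence_s)


if __name__ == "__main__":
    print(f"LC: {LC_CADENCE_S / 60:.4f} min -> {nyquist_uhz(LC_CADENCE_S):.1f} uHz")
    print(f"SC: {SC_CADENCE_S:.5f} s -> {nyquist_uhz(SC_CADENCE_S) / 1e3:.2f} mHz")
```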

The light curves of the 150,000 stars are recorded quasi-continuously and stored on board the spacecraft. However, data acquisition suffers from episodic breaks due to operational procedures. Apart from the aforementioned spacecraft rolls, data acquisition is also interrupted once a month to download the stored science and engineering data, because the spacecraft has to be reoriented to point its high-gain antenna towards the Earth, which stops the scientific data collection. Finally, every three days, one or more reaction wheels approach their maximum operating angular velocity. To desaturate the wheels, the spacecraft fires its thrusters and loses its precise attitude pointing for a few minutes. Once this operation is finished, Kepler returns to its normal fine-pointing mode, but several targets can suffer from degraded pointing performance during these events. In general, about one data point in LC and several data points in SC are affected by each desaturation. Another momentum-management problem arises when one of the reaction wheels crosses zero angular velocity. When this happens, the affected wheel rumbles and degrades the pointing on timescales of a few minutes. The primary consequence is an increased noise level in the SC centroid time series, with a resulting increase in noise in the pixel and flux time series (more details can be found in Haas et al. 2010). Figure 1 shows a typical example of the Q2 raw light curve of a solar-like star (black), the PDC-corrected light curve (blue) and the light curve we have corrected (red), in which all these interruptions can be seen.

The nominal timeline of Kepler science data collection has also been interrupted a few times by unexpected events such as attitude tweaks, pointing drifts, periods of excess spacecraft jitter, loss of fine pointing (LOFP) events, etc. The most important interruptions, as judged by their lengths, are the so-called safe-mode events. This special mode protects the spacecraft after an unanticipated response to a bad sequence of commands, or after a problem in the on-board electronics caused by a cosmic-ray hit. As an example, the longest safe-mode event occurred during the fourth roll (Q4) and lasted almost four days. During the second roll, a safe-mode event induced a two-day interruption (see Fig. 1). The data collected after science data collection resumes show a trend, which is strongly correlated with the warm-up of the local detector electronics. Depending on the CCD on which a star is observed, this effect manifests itself as a rising or falling trend in the light curve lasting for a few days (see Fig. 1).



3 CORRECTING THE INSTRUMENTAL 
PERTURBATIONS 

To correct the raw light curves we follow a phenomenological approach in which we correct three types of effects: outliers, jumps, and drifts (regardless of their physical origin). By doing so, we try to preserve, as much as possible, the low-frequency signal present in the data, keeping in mind that some thermal or other long-term instrumental effects could still remain in the light curves. In the case of KASC Working Group 2 ("Stars in clusters"), however, we have compared all the stars corrected with this pipeline with the standard procedure (PDC light curves; Jenkins et al. 2010), adopting for every single target the correction that performed best. Finally, we do not take into account the fraction of light thought to come from nearby stars; therefore, in some cases, the amplitudes will be diluted.



3.1 Outliers 

We have considered as outliers the individual measurements showing a point-to-point deviation in the two-point difference function of the light curve greater than 3σ for the SC and 5σ for the LC data, where σ is the standard deviation of the two-point difference function. Most of the points affected by this clipping are those observed during momentum desaturation maneuvers, as well as during periods when the reaction wheels cross zero angular velocity. This correction also removes points affected by the Argabrightening effect, named after its discoverer, V. Argabright (Van Cleve 2009). These points lie many standard deviations above the average, and their origin is not yet completely understood. They might be due to small dust particles from Kepler that reach escape velocity after micrometeorite hits and reflect sunlight into the barrel of the telescope as they drift across the FOV (further explanations can be found in Jenkins et al. 2010). The outlier correction removes about one percent of the data points. The deleted points are written in the data file as "-Inf" (see also Appendix A). In Fig. 1 we can see how all the outlier points seen in the raw light curve (black) were removed in the corrected one (red curve).
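The clipping rule above can be sketched in Python with NumPy. The text does not spell out how a large value of the two-point difference function is mapped back to an individual measurement; here we assume that a point is flagged when the differences to both of its neighbours exceed kσ, which catches isolated spikes while leaving genuine jumps (a single large difference) to the jump correction. The function names are hypothetical.

```python
import numpy as np


def flag_outliers(flux, k):
    """Boolean mask of outliers from the two-point difference function.

    Assumption (not stated in the text): point i is flagged when BOTH
    |f[i] - f[i-1]| and |f[i+1] - f[i]| exceed k * sigma, where sigma is the
    standard deviation of the two-point difference function.
    Use k=3 for SC and k=5 for LC data, as in the text.
    """
    f = np.asarray(flux, dtype=float)
    diff = np.diff(f)                    # two-point difference function, length n-1
    sigma = np.nanstd(diff)              # its standard deviation (NaN gaps ignored)
    large = np.abs(diff) > k * sigma     # comparisons with NaN evaluate to False
    mask = np.zeros(f.size, dtype=bool)
    # Interior point i lies between diff[i-1] and diff[i].
    mask[1:-1] = large[:-1] & large[1:]
    return mask


def write_ready(flux, mask):
    """Copy of the flux with flagged points set to -inf (written as "-Inf" in the files)."""
    out = np.array(flux, dtype=float)
    out[mask] = -np.inf
    return out
```

This simple rule only catches isolated spikes; a glitch spanning several consecutive SC points would need an iterative or wider-window variant.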



3.2 Jumps 

Jumps are defined as sudden changes in the mean value of the light curve due to, for example, attitude tweaks or sudden drops in pixel sensitivity. The light curves have been checked at every cadence for these sudden changes by comparing the mean flux of one-day-long segments. When the difference between two adjacent segments is larger than a certain threshold, an additive correction is applied (multiplicative for the red-giants working group), i.e. the difference in the average levels of the light-curve segments is added or subtracted so that the later part of the light curve matches the level of the first part. The threshold has been defined as five times the difference of the mean flux values of adjacent segments. Note that we always check for jumps at known times of satellite attitude changes, as noted by Van Cleve (2009).



The definition of the one-day segments in the jump correction only allows corrections from the second day of measurements until the penultimate day of the run. There are no automatic algorithms in place to detect and correct jumps in the first and last days of the quarter; these parts of the time series are inspected by eye and corrected manually if necessary.
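A minimal NumPy sketch of the sliding one-day comparison follows. At every cadence it compares the mean flux of the day before with that of the day after, which is why no jump can be detected in the first and last day. Two choices here are assumptions rather than the KASC settings: the detection scale is a robust (median absolute deviation) estimate of the scatter of the step statistic, and jumps are removed iteratively, largest first. Only the additive correction is shown; the multiplicative variant used for red giants would rescale rather than shift. Function names are hypothetical.

```python
import numpy as np


def step_statistic(t, f, window=1.0):
    """Mean flux in [t_i, t_i + window) minus mean flux in [t_i - window, t_i).

    Returns the statistic and a mask of cadences at least one window away
    from both ends of the time series (and with data on both sides).
    """
    cs = np.concatenate(([0.0], np.cumsum(f)))          # cs[j] = sum of f[:j]
    idx = np.arange(t.size)
    lo = np.searchsorted(t, t - window, side="left")    # first index of the day before
    hi = np.searchsorted(t, t + window, side="left")    # one past the last index of the day after
    n_before, n_after = idx - lo, hi - idx
    valid = (t - t[0] >= window) & (t[-1] - t >= window) & (n_before > 0) & (n_after > 0)
    before = (cs[idx] - cs[lo]) / np.maximum(n_before, 1)
    after = (cs[hi] - cs[idx]) / np.maximum(n_after, 1)
    return np.where(valid, after - before, 0.0), valid


def correct_jumps(time, flux, window=1.0, k=5.0, max_jumps=10):
    """Iteratively remove additive jumps; returns corrected flux and jump indices."""
    t = np.asarray(time, dtype=float)   # sorted, in days
    f = np.array(flux, dtype=float)     # outliers already removed (finite values only)
    jumps = []
    for _ in range(max_jumps):
        d, valid = step_statistic(t, f, window)
        if not valid.any():
            break
        dv = d[valid]
        scale = 1.4826 * np.median(np.abs(dv - np.median(dv)))  # robust sigma (assumption)
        i = int(np.argmax(np.abs(d)))                             # invalid cadences hold 0
        if abs(d[i]) <= k * scale:
            break
        f[i:] -= d[i]       # shift everything after the step back to the earlier level
        jumps.append(i)
    return f, jumps
```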

An example of these corrections is shown in Fig. 1, in which three attitude tweaks produced discontinuities in the light curve. Two of them (the first and third) were flagged as jumps and corrected; the second attitude tweak was treated as a drift (see the next subsection).



3.3 Drifts 

Drifts are small low-frequency perturbations, generally due to temperature changes (e.g. after a safe-mode event), that last for a few days. This correction is based on the software developed to correct the high-voltage perturbations in the GOLF/SoHO instrument (García et al. 2005). It consists of fitting a second- or third-order polynomial to the region where a thermal drift has been observed after comparing several light curves of the same roll. The fitted polynomial is then subtracted from the affected portion of the light curve, and another polynomial (of first or second order), used as a reference and fitted to a local non-perturbed region of the time series that includes data before and after the discontinuity, is added back. If the correction has to be applied at one border of the time series, only one side of the light curve is used to compute the reference function. After the correction, the result is validated manually.
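The drift correction can be sketched as follows, assuming the perturbed interval [start, stop] and the width of the reference region (ref_width, a hypothetical parameter) have already been chosen by comparing light curves of the same roll. It uses NumPy's `Polynomial.fit` for both the drift and the reference polynomials; if the interval touches an edge of the time series, the reference mask automatically contains data from one side only. This is an illustration, not the GOLF/SoHO-derived software itself.

```python
import numpy as np
from numpy.polynomial import Polynomial


def correct_drift(time, flux, start, stop, ref_width=1.0, drift_order=3, ref_order=1):
    """Replace a drift in [start, stop] by a low-order reference trend.

    drift_order: 2 or 3 (polynomial fitted to the perturbed region).
    ref_order:   1 or 2 (polynomial fitted to unperturbed data around it).
    """
    t = np.asarray(time, dtype=float)
    f = np.array(flux, dtype=float)
    bad = (t >= start) & (t <= stop)
    # Unperturbed data on both sides; near a border only one side exists.
    ref = ((t >= start - ref_width) & (t < start)) | ((t > stop) & (t <= stop + ref_width))
    if bad.sum() <= drift_order or ref.sum() <= ref_order:
        raise ValueError("not enough points to fit the polynomials")
    drift = Polynomial.fit(t[bad], f[bad], drift_order)
    reference = Polynomial.fit(t[ref], f[ref], ref_order)
    f[bad] += reference(t[bad]) - drift(t[bad])   # subtract drift, add reference
    return f
```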

In Fig. 1 we can see this type of correction applied to the first few days of the quarter (Q2.1), after the safe-mode event that also occurred in Q2.1, and at the second attitude tweak, where only the second segment (the beginning of the Q2.3 data) has been modified.



4 MERGING LIGHT CURVES OBSERVED IN 
MORE THAN ONE QUARTER 

Once the corrections are applied to the raw data, a single combined light curve is constructed for each star observed for longer than a month. The mean flux level of a star can differ considerably from quarter to quarter because, after each roll of the satellite, the star is observed on another CCD module with different characteristics (aperture, crowding metrics, etc.). Moreover, changes in the average flux can sometimes occur within a quarter. In the cases reported up to now, this problem is due to a poor definition of the optimal aperture for bright targets that saturate three or more pixels (Kepler magnitude Kp ≲ 11; Van Cleve 2009). As the mission has progressed, most of the stars showing these aperture problems have been identified and their apertures improved, thereby minimizing this effect.

Figure 2. Raw (black) and corrected (red) light curves of the red-giant target KIC 1161618. The time axis is in Modified Julian Date (MJD) − 55000. The inset shows the residual light curve after applying the 10-day filter in the post-processing phase. The vertical dashed lines separate the different quarters.

To implement the corrections at quarter boundaries, the first quarter for which data have been obtained for a given star is used as the reference. Different approaches have been followed for SC and LC time series.

In the case of the SC data of KASC Working Group 1 ("Solar-like oscillating stars"), only a few stars have been observed for longer than a month, because it was decided to perform a survey during the first year of Kepler scientific operations in which each star was observed for one month. Thus, only six solar-like stars showing a p-mode hump have been observed since the beginning of the mission (four of them are analysed in detail in Campante et al. 2011 and Mathur et al. 2011). To merge these data sets, we computed the mean value of one-day-long segments of the light curve at the start and end of each quarter, and we corrected for the difference using the Q0 data as the reference. The resulting light curves showed a smooth junction between all the quarters for the six stars.

For LC data the correction procedure is slightly different. As we have processed more than a thousand stars in KASC Working Group 8 ("Red giants") and several hundred (630 up to Q4) in Working Groups 4 and 10 ("δ Scuti" and "γ Doradus stars", respectively), we took into account that, in some cases, there was a slope at the quarter edges due to temperature gradients. Therefore, we computed the mean flux values of the last two days of a quarter and the first two days of the next one, as well as first-order polynomial fits through the same segments of the light curve. Then, we added (or subtracted) the difference in flux to the second light curve, based on either the mean values or the polynomials. Finally, for both solutions the light curves were merged and a polynomial was fitted through the four-day segment of the merged light curve spanning the last two days of the first quarter and the first two days of the second one. The solution with the lowest χ² was used for the final merged light curve. An example is shown in Fig. 2 for the red-giant target KIC 1161618. While some long periods are still present in the light curve, the discontinuities at the quarter edges disappear.
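The LC quarter-merging recipe can be sketched in Python as below. The text does not state how the offset is derived from the two linear fits or which polynomial order is used for the final four-day fit; here we assume the fits are compared at the midpoint of the gap between the quarters, that the final fit is linear, and that all points have equal weights (so χ² reduces to a sum of squared residuals). The names are illustrative.

```python
import numpy as np
from numpy.polynomial import Polynomial


def merge_quarters(t1, f1, t2, f2, edge=2.0, final_order=1):
    """Shift quarter 2 onto the level of quarter 1 (the reference).

    Returns merged time, merged flux and the offset added to quarter 2.
    """
    t1, f1, t2, f2 = (np.asarray(a, dtype=float) for a in (t1, f1, t2, f2))
    tail = t1 >= t1[-1] - edge        # last `edge` days of quarter 1
    head = t2 <= t2[0] + edge         # first `edge` days of quarter 2

    # Candidate 1: difference of the mean flux values.
    off_mean = f1[tail].mean() - f2[head].mean()
    # Candidate 2: difference of first-order fits, evaluated mid-gap (assumption).
    t_mid = 0.5 * (t1[-1] + t2[0])
    p1 = Polynomial.fit(t1[tail], f1[tail], 1)
    p2 = Polynomial.fit(t2[head], f2[head], 1)
    off_poly = p1(t_mid) - p2(t_mid)

    # Keep the candidate whose merged four-day segment is best fitted by a polynomial.
    t_seg = np.concatenate((t1[tail], t2[head]))
    best_off, best_chi2 = None, np.inf
    for off in (off_mean, off_poly):
        f_seg = np.concatenate((f1[tail], f2[head] + off))
        resid = f_seg - Polynomial.fit(t_seg, f_seg, final_order)(t_seg)
        chi2 = np.sum(resid ** 2)     # equal weights assumed
        if chi2 < best_chi2:
            best_off, best_chi2 = off, chi2
    return np.concatenate((t1, t2)), np.concatenate((f1, f2 + best_off)), best_off
```

The simpler SC procedure described above corresponds to using one-day segments and only the mean-value candidate.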

The final merged light curve is cut back into the individual quarters and saved as an extension of the original files available at KASOC (see Appendix A).



5 POST-PROCESSING OF REMAINING
LOW-FREQUENCY SIGNATURES IN THE
LIGHT CURVES

Although we made an effort to correct for most of the instrumental effects while retaining the long-term (rotation/granulation signal) and short-term (oscillations) features in the light curves, we suspect that some instrumental effects remain in the corrected merged light curves (see also Appendix B). Therefore, in some cases an additional filter should be applied to account for these instrumental effects. The details of this filtering can differ for different types of stars and different scientific aims. As an example, in the case of red giants, we chose to apply an additional triangular smoothing with a filter 10 days wide (equivalent to two passes of rectangular smoothing) to the light curves. To avoid the influence of gaps on this smoothing, we apply it to the light curve interpolated onto a regular grid of points and then interpolate the filtered signal back onto the Kepler time stamps. Subtracting this smoothed curve removes signals with timescales longer than 10 days, such as trends due to CCD degradation (see the inset in Fig. 2). This means that we can investigate the granulation, i.e. the signature of large convection cells present in the turbulent outer atmosphere of low-mass main-sequence stars and red giants, up to a timescale of 10 days (Mathur et al. in preparation; Mosser et al. in preparation). After removing the granulation signature, oscillations can be investigated. These have timescales of minutes for main-sequence stars up to hours or even a few days in red giants.
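A minimal sketch of this post-processing filter is shown below, assuming that "10 days wide" refers to the full base of the triangular kernel, so that each of the two boxcar passes spans 5 days, and that the light curve is resampled at its median cadence. Edge handling (repeating the end values) and the function names are our own choices; the KASC implementation may differ.

```python
import numpy as np


def boxcar(y, n):
    """Running mean over n samples, padding the ends with the edge values."""
    padded = np.pad(y, (n // 2, n - 1 - n // 2), mode="edge")
    return np.convolve(padded, np.ones(n) / n, mode="valid")


def triangular_highpass(time, flux, width=10.0):
    """Remove signals longer than `width` days with a triangular smoothing.

    Returns (residual, trend) on the original time stamps.
    """
    t = np.asarray(time, dtype=float)
    f = np.asarray(flux, dtype=float)
    good = np.isfinite(f)                         # skip outliers stored as -inf
    dt = np.median(np.diff(t[good]))
    n_grid = int(np.floor((t[good][-1] - t[good][0]) / dt)) + 1
    grid = t[good][0] + dt * np.arange(n_grid)    # regular grid bridging the gaps
    f_grid = np.interp(grid, t[good], f[good])
    n = max(1, int(round(0.5 * width / dt)))      # each box spans width/2
    trend_grid = boxcar(boxcar(f_grid, n), n)     # two boxcars = triangular kernel
    trend = np.interp(t, grid, trend_grid)        # back onto the Kepler time stamps
    return f - trend, trend
```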



6 KNOWN ARTIFACTS IN THE PSD 

The power spectrum of the Kepler data suffers from some perturbations at given frequencies (mostly in SC data). The first spurious peaks to report are the harmonics of the long-cadence frequency (1/LC), at multiples of 566.4 μHz, which appear in the SC spectrum. The effect is stronger (more harmonics) for fainter stars, and it seems to be produced by the built-in electronics (Gilliland et al. 2010b). A second group of peaks appears at constant, very high frequencies in SC spectra, around 7024, 7444, 7865, and 8286 μHz, which are separated by 421 μHz (about 40 minutes). Finally, two other peaks appear at 5017 and 5584 μHz. These last two groups of peaks remain of unknown origin.
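To make the first group concrete, the Python sketch below derives the 1/LC frequency from the cadences given in Section 2 (one long cadence being 30 short cadences), lists its harmonics up to the SC Nyquist frequency, and builds a mask that could be used to flag power-spectrum bins near any of the known artifact frequencies. The ±1 μHz half-width and the function names are arbitrary choices for illustration.

```python
import numpy as np

SC_CADENCE_S = 58.84876
LC_CADENCE_S = 30 * SC_CADENCE_S            # one LC = 30 SC


def lc_harmonics_uhz():
    """Harmonics of 1/LC (about 566.4 uHz) up to the SC Nyquist frequency."""
    f_lc = 1e6 / LC_CADENCE_S
    nyquist = 1e6 / (2 * SC_CADENCE_S)       # equals 15 * f_lc up to rounding
    k = np.arange(1, int(np.floor(nyquist / f_lc + 1e-9)) + 1)
    return k * f_lc


# Other fixed-frequency SC artifacts quoted in this section (uHz).
OTHER_ARTIFACTS_UHZ = np.array([7024.0, 7444.0, 7865.0, 8286.0, 5017.0, 5584.0])


def artifact_mask(freq_uhz, known_uhz, half_width_uhz=1.0):
    """True for frequency bins within half_width_uhz of any known artifact."""
    freq = np.asarray(freq_uhz, dtype=float)[:, None]
    known = np.asarray(known_uhz, dtype=float)[None, :]
    return np.any(np.abs(freq - known) <= half_width_uhz, axis=1)
```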

In the data obtained up to the second quarter, there are some peak structures in the range 80 to 95 μHz that resemble asteroseismic signatures because of the non-sinusoidal nature of the perturbation (e.g. Hekker et al. 2011). These peaks are related to the variation of the reaction wheel housing temperature. They have been eliminated by reducing the corresponding temperature controller deadband (Van Cleve 2009). There is also another peak structure around 200 to 400 μHz, with associated artifacts between ~500 and 530 μHz, that shifts in frequency with time for a yet unknown reason (Antochi, private communication).

Finally, spurious peaks have been found at a period of 3 days, related to the momentum-management cycle and the associated temperature changes, and around 4500 μHz, occurring mostly for stars with a moderate activity signal.



7 CONCLUSIONS 

In this work we have described the corrections applied to the Kepler asteroseismic raw light curves to minimize some known instrumental effects and to produce the working-group corrected files (labeled "_wg#" in the KASOC database, where # denotes the working group number). Three main phenomenological effects have been treated: outliers, jumps and drifts. For time series longer than a month, a correction has been applied to smooth the discontinuities at the edges of the quarters. Finally, we have explained the structure of the data files and the known spurious frequencies that appear in the power spectrum of the asteroseismic targets.



APPENDIX A: FILE STRUCTURE OF 
WORKING GROUP CORRECTED DATA 

The corrected light curves are saved in an ASCII file with the same name as the original one but with "_wg#" appended, and uploaded to the KASOC database (a complete description can be found in Handberg & Kjeldsen, KASC User Requirements Specifications: Working Group Corrected Data).

The structure of the file is the same as the original one, with two extra columns containing the working group (wg) corrected flux and the wg corrected errors. For the moment, this last column is just a copy of the raw flux error. The points flagged as outliers are written as "-Inf".

The new file header contains some new information. A) An extra line is added below the first one to identify the file as having been processed by the working group. This second line is of the form "Working Group N Corrected data by <Name>", where N is the number of the working group and <Name> is the name of the person who processed the new data file. B) The version number is changed by appending ".YY", where YY denotes the version number of the correction software (currently version 1). We also add the creation date of the file in parentheses after a blank space. Thus, the line looks like: "# Version: 2.1 (1 Oct 2010)". C) In the line describing the columns of the file we add: "WG# Corrected Flux, WG# Corrected error".
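As an illustration of this layout, the Python sketch below builds the modified version line and reads the data rows of a "_wg#" file. It assumes that header lines start with "#" (as in the version-line example above), that columns are separated by white space, that time is the first column and that the two working-group columns are the last two; these are assumptions, and the complete specification is given by Handberg & Kjeldsen. Python's float() parses "-Inf" as negative infinity, so flagged points are easy to mask. Function names are hypothetical.

```python
import numpy as np


def wg_version_line(data_version, software_version, date):
    """Header version line, e.g. data version 2 + software version 1 -> "2.1"."""
    return f"# Version: {data_version}.{software_version} ({date})"


def read_wg_rows(lines):
    """Parse data rows of a working-group corrected file.

    Returns time, wg flux, wg error and a mask of usable (non-outlier) points.
    Assumes '#' header lines, whitespace-separated columns, time in the first
    column and the two WG columns last.
    """
    rows = [
        [float(x) for x in line.split()]     # float("-Inf") -> -inf
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    ]
    data = np.array(rows, dtype=float)
    time, wg_flux, wg_err = data[:, 0], data[:, -2], data[:, -1]
    return time, wg_flux, wg_err, np.isfinite(wg_flux)
```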